\section{Our technique}

We illustrate our technique with a simplified version of the function
\texttt{find}.  Recall that this function searches a sequence for the
first occurrence of some item passed as an argument, and that the
behavior can be altered as usual with parameters for determining the
comparison function, a key function to apply to each element, the
direction of the search, and the interval to search.

In the begining of this section, our version is simplified in the following way:

\begin{itemize}
\item The only type of sequence handled is \texttt{vector}.
\item The \texttt{test} function is fixed to be \texttt{eql}.
\item The interval to search is the entire vector.
\item The key function to apply to each element is fixed to be
  \texttt{identity}.
\item The search is from the beginning of the vector.
\end{itemize}

In the general version of the \texttt{find} function, all these
parameters must of course be taken into account, and then our
technique becomes even more applicable and even more important.  But
the general version does not require any additional difficulty beyond
what is needed for the special case, and the general case would only
clutter the presentation, hence the special version which we will call
\texttt{find-vector}.

Clearly, in terms of portability and maintainability, it would be
desirable to implement \texttt{find-vector} like this:

{\small\begin{verbatim}
(defun find-vector-1 (item vector)
  (declare (optimize (speed 3) (debug 0) (safety 0)))
  (loop for index from 0 below (length vector)
        for element = (aref vector index)
        when (eql item element)
          return element))
\end{verbatim}}

Unfortunately, most implementations would have difficulties optimizing
this version, simply because the exact action required by the function
\texttt{aref} depends on the \emph{element type} of the vector, and
whether the vector is a \emph{simple-array}.  This information is
clearly \emph{loop invariant}, but most compilers do not contain
adequate optimization passes in order to duplicate and specialize the
loop.

To improve code layout, in what follows, we assume the following type
definition:

{\small\begin{verbatim}
(deftype simple-byte-vector ()
  `(simple-array (unsigned-byte 8)))
\end{verbatim}}

To help the compiler, one can imagine a version like this:

{\small\begin{verbatim}
(defun find-vector-2 (item vector)
  (declare (optimize (speed 3) (debug 0) (safety 0)))
  (if (typep vector 'simple-byte-vector)
      (loop for index from 0 below (length vector)
            for element = (aref vector index)
            when (eql item element)
              return element)
      (loop for index from 0 below (length vector)
            for element = (aref vector index)
            when (eql item element)
              return element)))
\end{verbatim}}

Here, we have illustrated the specialization with a non-standard
element type, so that either an implementation-specific type predicate
has to be used, or (as in our example) a call to \texttt{typep} is
needed.

Whether a local declaration of the type of the vector in addition to
the call to \texttt{typep} is required for the compiler to optimize
the call to \texttt{aref} is of course implementation specific.
Similarly, whether a special version (possibly
implementation-specific) of \texttt{aref} is required also depends on
the implementation.

Not only do we now have implementation-specific code, but we also
have a maintenance problem.  The loop will have to be duplicated for
each sequence function, and for every specific type that the
implementation can handle.  This duplication requires separate tests
for each case so as to guarantee as much coverage as possible.  Given
the number of combinations of types, plus the additional parameters we
have omitted, this requirement quickly becomes unmanageable.

To solve this problem, we introduce a macro \texttt{with-vector-type} that
abstracts the implementation-specific information and that takes care
of duplicating the loop:

{\small\begin{verbatim}
(defmacro with-vector-type (vector-var &body body)
  `(macrolet ((vref (array index)
                `(aref ,array ,index)))
     (if (typep ,vector-var 'simple-byte-vector)
         (locally (declare (type simple-byte-vector
                                 ,vector-var))
           ,@body)
         (progn
           ,@body))))
\end{verbatim}}

Here, we have introduced a new operator named \texttt{vref} in the
form of a local macro, and that is globally defined to expand to
\texttt{aref}.  This global definition works for SBCL, but different
implementations may need different expansions in different branches.
For example, some implementations might need for the macro to expand a
call to \texttt{sbit} in a branch where the vector is a simple bit
vector.

We have also introduced a local declaration for exact type of the
vector in the specialized branch.  Each implementation must determine
whether such a declaration is necessary.

Using this macro, we can now write our function \texttt{find-vector}
like this:%
\footnote{In our examples, as in our real code, we frequently use a
  value of 0 for the \texttt{safety} optimize quality.  However, this
  is the case only for auxilary functions called from protocol
  functions that have verified all parameters before calling such an
  auxiliary function.  Overall, the combination is therefore still
  safe as seen by the application programmer.}

{\small\begin{verbatim}
(defun find-vector-4 (item vector)
  (declare (optimize (speed 3) (debug 0) (safety 0)))
  (with-vector-type vector
    (loop for index from 0 below (length vector)
          for element = (vref vector index)
          when (eql item element)
            return element)))
\end{verbatim}}

We notice a couple of essential properties of this code:

\begin{itemize}
\item The exact set of available vector types in the implementation is
  hidden inside the macro \texttt{with-vector-type}, which would have
  a different version in different \commonlisp{} implementations, but
  there will be a single occurrence of this macro for all the sequence
  functions.
\item The maintenance problem resulting from duplicating the loop has
  disappeared, because the macro \texttt{with-vector-type} is in
  charge of the duplication, making it certain that the copy is exact.
\end{itemize}

For a second example in the same spirit, consider how the keyword
parameter \texttt{end} is handled when the sequence is a list.
Again, we illustrate our technique with a simplified version of the
\texttt{find} function.

As for the previous example, in terms of portability and
maintainability, it would be desirable to implement
\texttt{find-list} like this:

{\small\begin{verbatim}
(defun find-list-1 (item list &key end)
  (declare (optimize (speed 3) (debug 0) (safety 0)))
  (loop for index from 0
        for element in list
        when (and (not (null end)) (>= index end))
          return nil
        when (eql item element)
          return element))
\end{verbatim}}

As with the previous example, most \commonlisp{} implementations would
have difficulties optimizing the code, even though the test
\texttt{(null end)} is loop invariant.  We solve this problem by
introducing the following macro:

{\small\begin{verbatim}
(defmacro with-end (end-var &body body)
  `(if (null ,end-var)
       (progn ,@body)
       (progn ,@body)))
\end{verbatim}}

The code for \texttt{find-list} can now be written like this:

{\small\begin{verbatim}
(defun find-list-2 (item list &key end)
  (declare (optimize (speed 3) (debug 0) (safety 0)))
  (with-end end
    (loop for index from 0
          for element in list
          when (and (not (null end)) (>= index end))
            return nil
          when (eql item element)
            return element))
\end{verbatim}}

We notice that the \texttt{loop} body looks the same as in the
portable and maintainable version shown before, and the only
difference is that the loop has been wrapped in a call to the macro
\texttt{with-end}.  A good compiler will now specialize each of the
two copies of the loop introduced by the \texttt{with-end} macro
according to the value (i.e., \texttt{nil} or not) of the variable
\texttt{end}.  In the first copy, the entire first \texttt{when}
clause of the loop will be removed.  In the second copy, the test in
the first \texttt{when} clause of the loop is reduced to the
comparison between \texttt{index} and \texttt{end}.
